Abstract. Support vector data description (SVDD) is a well-known kernel
method that constructs a minimal hypersphere regarded as a data description
for a given data set. However, SVDD does not take into account any statistical
distribution of the data set in constructing that optimal hypersphere, and
SVDD is applied to solving one-class classification problems only. This paper
proposes a new approach to SVDD to address those limitations. We formulate an
optimisation problem for binary classification in which we construct two
hyperspheres, one enclosing positive samples and the other enclosing negative
samples, and during the optimisation process we move the two hyperspheres
apart to maximise the margin between them while the data samples of each class
remain inside their own hyperspheres. Experimental results show good
performance for the proposed method.
1 Introduction

Support  vector  data  description  (SVDD)  [1]  was  proposed  by  Tax  and  Duin 
to  train  a  hyperspherically  shaped  boundary  around  a  normal  dataset  while 
keeping  all  abnormal  data  samples  outside  the  hypersphere.  This  SVDD  has 
been  a  successful  approach  to  solving  one-class  problems  such  as  outlier  detection 
since  the  volume  of  this  data  description  is  kept  minimal.  One-class  support 
vector  machine  (OC-SVM)  [2]  is  a  similar  approach  proposed  earlier  to  estimate 
the  support  of  a  high-dimensional  distribution.  Although  this  method  uses  a 
maximal-margin  hyperplane  instead  of  a  hypersphere  to  separate  the  normal 
data  from  the  abnormal  data,  it  has  the  same  optimisation  problem  as  SVDD. 
In  both  OC-SVM  and  SVDD,  the  boundary  in  the  feature  space  when  mapped 
back  to  the  input  space  can  produce  a  complex  and  tight  description  of  the  data 
distribution. 

There are various extensions to SVDD. A small hypersphere and large margin
approach was proposed in [3] for novelty detection problems, where a minimal
hypersphere is trained to include most of the normal examples while the margin
between the hypersphere and the outliers is made as large as possible. A
further extension using two large margins instead of one was proposed in [4],
where an interior margin between the hypersphere and the normal data and an
exterior margin between the hypersphere and the abnormal data are both
maximised. In [5], the authors define an optimisation problem as maximising
the separation ratio $(R + d)/(R - d)$, where $R$ is the hypersphere's radius
and $d$ is the hypersphere's margin. It is shown to be equivalent to
minimising $R^2 - kd^2$, where $k$ is a parameter that adjusts the balance
between minimising $R$ and maximising $d$. Hao et al. [6] also used a similar
formulation in which several similarity functions were used to compute the
distance to centres. Another extension of SVDD is [7], in which the use of two
SVDDs for the description of data with two classes was proposed.

However, all of those models are for one-class problems in which the task is
to provide a tight data description or to detect outliers. When applied to a
two-class problem where the numbers of data samples of the two classes are not
much different, the boundary of one-class methods is inappropriate. To
overcome this problem, the first straightforward approach is to train two
SVDDs, one for each class, and define the decision boundary as the bisector
between the two surfaces of the hyperspheres. Although this approach improves
the performance of one-class methods for two-class problems, it is limited by
the small-sphere constraint of the data description.

In this paper, we propose a method using two SVDDs, one enclosing positive
samples and the other enclosing negative samples, for binary classification
tasks. The minimum bounding hypersphere constraint is relaxed to allow the
hyperspheres to acquire larger regions. This is achieved by imposing a
criterion that maximises the distance between the two hyperspheres while still
keeping the data inside the spheres. A margin variable is added to the
optimisation to further improve the classification boundary. Since the
proposed method trains two SVDDs that repel each other, we call it
repulsive-SVDD classification (RSVC). The RSVC decision boundary can be
considered as a compromise between an SVM boundary and the bisector boundary
of two SVDDs' surfaces. This compromise is controlled by a trade-off parameter
that adjusts the balance between describing the data and maximising the
distance between the two sphere centres.

The  rest  of  the  paper  is  organized  as  follows.  The  theory  of  the  proposed 
RSVC  will  be  presented  in  Section  2.  Comparison  of  RSVC  with  Two  SVDDs 
will  be  discussed  in  Section  3.  Experimental  results  are  presented  to  show  the 
performance  of  the  proposed  method  in  Section  4.  Finally,  Section  5  presents 
our  conclusions. 


2 Proposed Approach: Repulsive-SVDD Classification (RSVC)

To apply SVDD to binary classification problems, we construct a hypersphere
for each class to describe its data distribution, with additional properties
to discriminate the two classes. First, the hypersphere constraint in SVDD is
relaxed to allow this hypersphere to acquire a larger area that is far from
the other class. This is achieved by imposing a criterion that maximises the
distance between the two hyperspheres while still keeping all data samples of
a class inside its hypersphere. Second, the margin (i.e., the distance between
the surfaces of the two hyperspheres) is maximised, similar to the maximal
margin philosophy of a support vector machine.

A visualisation of RSVC is given in Fig. 1. In the left figure, SVM determines
a maximum margin hyperplane without considering the data distributions of the
positive and negative classes. In the middle figure, SVDDs determine two
minimal hyperspheres without considering the margin between the two classes,
and the decision boundary is the hyperplane perpendicular to the line segment
connecting the two hypersphere centres.

By contrast, our RSVC can provide an intermediate solution between SVM and
SVDDs. Given the problem in Fig. 1, the RSVC optimisation problem attempts to
keep the radii minimal while maximising the distance between the two
hyperspheres. As a result, the hyperspheres will expand in the direction that
increases the distance between them. Moreover, the relative weights of these
two objectives can be controlled by a parameter.


[Figure 1 image; panel labels: (SVM), (SVDDs)]

Fig. 1. SVM (left figure) determines a maximum margin hyperplane without
considering data distributions of positive and negative classes. SVDDs (middle
figure) determine minimum hyperspheres without considering the margin between
two classes. RSVC (right figure) determines two minimal hyperspheres, one
enclosing positive samples and the other enclosing negative samples, while
maximising the distance between two centres to a degree controlled by a
parameter.


2.1  Problem  Formulation 

Consider a dataset $\{x_i\}$, $i = 1, \ldots, n$ with two classes, a positive
class with $n_1$ data samples and a negative class with $n_2$ data samples,
$n_1 + n_2 = n$. The problem of RSVC is to determine two optimal hyperspheres
$(a_1, R_1)$ and $(a_2, R_2)$, one enclosing the data samples of the positive
class and the other enclosing the data samples of the negative class, while at
the same time maximising the distance between the two centres. In addition,
all positive and negative data samples are forced to stay outside the margins
$\rho_1$ and $\rho_2$ of the positive hypersphere and the negative hypersphere
respectively. The optimisation problem is formulated as follows:

where $k$ is a parameter which represents the repulsive degree between the two
centres, $\mu_1$ and $\mu_2$ are two parameters controlling the support
vectors, and $\phi$ is the mapping that transforms the vector $x_i$ to a
feature space.

The above problem is for separable datasets. In practice, to allow errors, the
constraints are relaxed by introducing slack variables $\xi_{1i}$ and
$\xi_{2i}$, and penalty terms are added to the objective function. In
addition, if we combine the constraints in this problem into a simpler form,
the optimisation problem becomes:

where $\nu_1$ and $\nu_2$ are parameters controlling the number of support
vectors, together with $\mu_1$ and $\mu_2$. They will be explained in
Proposition 1 below.

2.2  Convex  Formulation  of  RSVC 

Although  the  optimisation  in  (7)  has  a  non-convex  objective  function,  it  can  be 
reformulated  to  have  a  convex  form  as  follows: 
Let $\delta_1 = a_1^2 - R_1^2$, $\delta_2 = a_2^2 - R_2^2$ and
$0 \le \delta_0 \le \|a_1 - a_2\|^2$; then (12) becomes

We can construct the Lagrange function below using the following Lagrange
multipliers $\alpha_{1i}, \alpha_{2i}, \gamma_{1i}, \gamma_{2i}, \beta_1,
\beta_2, \beta, \lambda$:

Equations (28) and (29) lead to

By substituting the KKT conditions into the Lagrangian function we obtain
the dual form of the optimisation:

where the inner product between vectors has been replaced by the kernel $K$,
and the Lagrange multipliers $\gamma_{1i} \ge 0$, $\gamma_{2i} \ge 0$,
$\beta_1 \ge 0$, $\beta_2 \ge 0$, $\lambda \ge 0$ have been removed from
Equations (30), (31), (32), (33) and (27) respectively. Similarly to
$\nu$-SVC, $\sum_i \alpha_{1i}$ is set to $\mu_1$, $\sum_i \alpha_{2i}$ is set
to $\mu_2$ and $\beta$ is set to $k$, where $k$ is a parameter chosen in the
range $k \in$ [0, ^).

It can be seen that if $k$ is set to 0 in the above optimisation problem then
the RSVC optimisation problem (35) can be broken into two independent
optimisation problems similar to SVDDs, except for the extra constraints
$\sum_i \alpha_{1i} = \mu_1$ and $\sum_i \alpha_{2i} = \mu_2$ resulting from
the margin requirements in the original RSVC problem (1).

Solving the problem (35) gives a set of $\alpha_{1i}, \alpha_{2i}$. Then the
centres $a_1, a_2$ can be determined from Equations (34).

To determine the radius $R_1$, the support vector $x_t$ that lies on the
surface of the hypersphere $(a_1, R_1)$ and corresponds to the smallest
$\alpha_{1t} \in (0, \frac{1}{\nu_1 n_1})$ is selected. Then the radius $R_1$
is calculated as $R_1 = d_1(x_t)$, where $d_1(x_t)$ is the distance from $x_t$
to the centre $a_1$ and is determined as follows:

The radius $R_2$ is calculated similarly:

$$d_2^2(x_t) = \|\phi(x_t) - a_2\|^2 = K(x_t, x_t) - \cdots \qquad (40)$$

In the test phase, whether a sample $x$ belongs to the hypersphere
$(a_1, R_1)$ or $(a_2, R_2)$, i.e. class +1 or class -1, is determined by the
following decision function:

$$\mathrm{sign}\left(d_2^2(x) - d_1^2(x)\right) \qquad (41)$$
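
The kernel-space distances used by this decision rule can be sketched in a few lines. This is only an illustration under stated assumptions: each centre is written as a generic weighted sum of mapped training points, the weights `w_pos` and `w_neg` are hypothetical placeholders (they are not the coefficients given by the paper's centre equations), and the sign convention labels a sample +1 when it is nearer to the positive centre. The kernel is the Gaussian RBF kernel used in the experiments.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((xi - zi) ** 2 for xi, zi in zip(x, z)))


def squared_distance(x, points, weights, gamma):
    """Return ||phi(x) - a||^2 for a centre a = sum_i weights[i] * phi(points[i]).

    The weights are hypothetical placeholders for the coefficients of the
    centre expansion; they are not taken from the paper.
    """
    k_xx = rbf_kernel(x, x, gamma)
    cross = sum(w * rbf_kernel(x, p, gamma) for w, p in zip(weights, points))
    centre_norm = sum(
        wi * wj * rbf_kernel(pi, pj, gamma)
        for wi, pi in zip(weights, points)
        for wj, pj in zip(weights, points)
    )
    return k_xx - 2.0 * cross + centre_norm


def classify(x, points, w_pos, w_neg, gamma):
    """Label x as +1 when it is at least as close to the positive centre, else -1."""
    d1_sq = squared_distance(x, points, w_pos, gamma)
    d2_sq = squared_distance(x, points, w_neg, gamma)
    return 1 if d2_sq - d1_sq >= 0 else -1


points = [(0.0, 0.0), (4.0, 4.0)]
w_pos, w_neg = [1.0, 0.0], [0.0, 1.0]  # toy centres: one training point each
print(classify((0.5, 0.2), points, w_pos, w_neg, gamma=0.5))
print(classify((3.9, 4.1), points, w_pos, w_neg, gamma=0.5))
```

The expansion $K(x,x) - 2\sum_i w_i K(x,x_i) + \sum_{i,j} w_i w_j K(x_i,x_j)$ holds for any centre that is a linear combination of mapped points, which is why the sketch never needs the mapping $\phi$ explicitly.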


2.3 $\nu$-Property

Following [8], a data sample $x_i$ is called a support vector if it has
Lagrange multiplier $\alpha_i > 0$; a data sample is called a margin error if
it has positive slack variable $\xi_i > 0$.

Similarly to the property of the $\nu$ parameter in $\nu$-SVC [8], we derive
the property for the $\nu_1, \nu_2, \mu_1$ and $\mu_2$ parameters and use it
for parameter selection to train the RSVC.


Proposition 1. Let $m_1$ and $m_2$ denote the numbers of margin errors of the
positive sphere and negative sphere respectively, and let $s_1$ and $s_2$
denote their numbers of support vectors. Then for parameters
$\nu_1, \nu_2, \mu_1$ and $\mu_2$ we have:

1. $\mu_1\nu_1$ and $\mu_2\nu_2$ are upper bounds on the fraction of margin
errors and lower bounds on the fraction of support vectors for the positive
sphere and negative sphere respectively:

$$\frac{m_1}{n_1} \le \mu_1\nu_1 \le \frac{s_1}{n_1}, \qquad \frac{m_2}{n_2} \le \mu_2\nu_2 \le \frac{s_2}{n_2} \qquad (42)$$

2. The feasible ranges of $\nu_1, \nu_2, \mu_1$ and $\mu_2$ are:

$$0 < \nu_1 \le 1,\ 1 \le \mu_1 \le \frac{1}{\nu_1} \quad \text{and} \quad 0 < \nu_2 \le 1,\ 1 \le \mu_2 \le \frac{1}{\nu_2} \qquad (43)$$

Proof. We first prove the result for the positive hypersphere.

1. By the KKT conditions, all data points with $\xi_{1i} > 0$ have
$\gamma_{1i} = 0$. From (30), the equation $\alpha_{1i} = 1/(\nu_1 n_1)$
therefore holds for every margin error. Summing up $\alpha_{1i}$ and using
$\sum_i \alpha_{1i} = \mu_1$ from (37) we have:

$$\frac{m_1}{\nu_1 n_1} \le \sum_i \alpha_{1i} = \mu_1 \qquad (44)$$

On the other hand, (38) indicates that each support vector of the positive
hypersphere can contribute at most $1/(\nu_1 n_1)$. Therefore, summing up
$\alpha_{1i}$ over the support vectors of the positive hypersphere, with
$\alpha_{1i} = 0$ for non-support vectors, and using (37) we have:

$$\frac{s_1}{\nu_1 n_1} \ge \sum_i \alpha_{1i} = \mu_1 \qquad (45)$$

Combining (44) and (45) we have the inequalities (42) for the positive
hypersphere.

2. From (42) we have $0 \le \mu_1\nu_1 \le 1$. In addition, from (36) we have
$\sum_i \alpha_{1i} y_i = 1$, so that
$\mu_1 = \sum_i \alpha_{1i} \ge \sum_i \alpha_{1i} y_i = 1$ because every
$\alpha_{1i} \ge 0$ and $y_i \in \{+1, -1\}$. Combining these results we have
the proof of (43).

The proof of inequalities (42) and (43) for the negative hypersphere is similar.
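
As a quick sanity check of Proposition 1, the two statements can be written as small predicates. This is a sketch, not part of the original method: the function names are hypothetical, and the feasible range is read as $0 < \nu \le 1$ and $1 \le \mu \le 1/\nu$, where the strictness of each inequality is an assumption because the printed relation symbols are ambiguous.

```python
def in_feasible_range(nu, mu):
    """Check the range read from (43): 0 < nu <= 1 and 1 <= mu <= 1/nu.

    nu > 0 is required here so that 1/nu is defined (an assumption about
    the strictness of the printed inequality).
    """
    return 0.0 < nu <= 1.0 and 1.0 <= mu <= 1.0 / nu


def nu_property_holds(margin_errors, support_vectors, n, nu, mu):
    """Check that mu*nu lies between the margin-error and support-vector fractions."""
    product = mu * nu
    return margin_errors / n <= product <= support_vectors / n


# Example with one sphere holding n = 50 samples, nu = 0.01 and mu = 10,
# so that mu*nu = 0.1: at most 5 margin errors and at least 5 support vectors.
print(in_feasible_range(0.01, 10))
print(nu_property_holds(margin_errors=3, support_vectors=12, n=50, nu=0.01, mu=10))
```

In this reading, the product $\mu\nu$ plays the role that $\nu$ alone plays in $\nu$-SVC, which is why the two parameters are tuned together.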

The proposed RSVC is for binary classification problems. It can be extended
to multi-class classification problems by using the “one-against-the-rest”
approach or the “one-against-one” approach. Following [9], we use the
one-against-one approach in this paper, where the data of every pair of
classes are used to train a binary classifier that separates the two classes,
resulting in $M(M - 1)/2$ classifiers in an $M$-class classification problem.
In the test phase, a voting strategy is used: each binary classification of a
test sample generates a vote, and the class with the maximum number of votes
for this test data sample is output as the overall classification result. If
two classes have identical votes, one can simply choose the class appearing
first in the array storing the class names, as in [9].
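
The one-against-one voting scheme described above can be sketched as follows. The callable `binary_predict` is a hypothetical stand-in for a trained pairwise classifier (for instance a binary RSVC); its name and signature are assumptions made for this illustration only.

```python
from itertools import combinations


def one_against_one_predict(x, classes, binary_predict):
    """Combine pairwise decisions by voting.

    binary_predict(a, b, x) is a hypothetical callable that returns either
    a or b, the class chosen for x by the classifier trained on that pair.
    Ties are broken in favour of the class listed first in `classes`.
    """
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):  # M(M-1)/2 pairs
        votes[binary_predict(a, b, x)] += 1
    best = max(votes.values())
    for c in classes:  # the first class reaching the top vote count wins
        if votes[c] == best:
            return c


def toy_binary(a, b, x):
    """Toy pairwise rule producing a three-way tie (A beats B, B beats C, C beats A)."""
    winners = {("A", "B"): "A", ("A", "C"): "C", ("B", "C"): "B"}
    return winners[(a, b)]


print(one_against_one_predict(None, ["A", "B", "C"], toy_binary))
```

With the cyclic toy rule every class receives one vote, so the tie-break returns the first class in the list.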


3  Comparison  of  RSVC  with  Two  SVDDs 


SVDD can be extended to two SVDDs to describe a data set of two classes.
Consider a data set $\{x_i\}$, $i = 1, \ldots, n$ of two classes, a positive
class with $n_1$ data samples and a negative class with $n_2$ data samples,
$n_1 + n_2 = n$. The optimisation problem is formulated as follows [7]:


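$$
\begin{aligned}
\min_{R_1, R_2, a_1, a_2, \xi_{1i}, \xi_{2i}} \quad & R_1^2 + R_2^2 + \frac{1}{\nu_1 n_1} \sum_i \xi_{1i} + \frac{1}{\nu_2 n_2} \sum_i \xi_{2i} \\
\text{s.t.} \quad & \|x_i - a_1\|^2 \le R_1^2 + \xi_{1i}, \quad \forall i,\ y_i = +1 \\
& \|x_i - a_1\|^2 \ge R_1^2 - \xi_{1i}, \quad \forall i,\ y_i = -1 \\
& \|x_i - a_2\|^2 \le R_2^2 + \xi_{2i}, \quad \forall i,\ y_i = -1 \\
& \|x_i - a_2\|^2 \ge R_2^2 - \xi_{2i}, \quad \forall i,\ y_i = +1 \\
& \xi_{1i} \ge 0,\ \xi_{2i} \ge 0, \quad \forall i
\end{aligned}
\qquad (46)
$$

where $(a_1, R_1)$ and $(a_2, R_2)$ are two hyperspheres, and $\nu_1$ and
$\nu_2$ are parameters.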
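
To make the role of the slack variables in (46) concrete, the objective can be evaluated for fixed centres and radii by taking the smallest slack that satisfies each constraint. This is a sketch in input space (no kernel), and it assumes the slack penalties are weighted by $1/(\nu_1 n_1)$ and $1/(\nu_2 n_2)$, which is how the garbled objective is read here; the function names are hypothetical.

```python
def sq_dist(x, a):
    """Squared Euclidean distance between two points."""
    return sum((xi - ai) ** 2 for xi, ai in zip(x, a))


def two_svdd_objective(X, y, a1, R1, a2, R2, nu1, nu2):
    """Evaluate the two-SVDD objective with the minimal feasible slacks.

    Assumption: slack terms are weighted by 1/(nu1*n1) and 1/(nu2*n2).
    """
    n1 = sum(1 for yi in y if yi == +1)
    n2 = len(y) - n1
    slack1 = slack2 = 0.0
    for x, yi in zip(X, y):
        g1 = sq_dist(x, a1) - R1 ** 2  # > 0 means x is outside sphere 1
        g2 = sq_dist(x, a2) - R2 ** 2  # > 0 means x is outside sphere 2
        if yi == +1:
            slack1 += max(0.0, g1)   # positives should be inside sphere 1
            slack2 += max(0.0, -g2)  # and outside sphere 2
        else:
            slack1 += max(0.0, -g1)  # negatives should be outside sphere 1
            slack2 += max(0.0, g2)   # and inside sphere 2
    return R1 ** 2 + R2 ** 2 + slack1 / (nu1 * n1) + slack2 / (nu2 * n2)


X = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
y = [+1, +1, -1, -1]
print(two_svdd_objective(X, y, (0.5, 0.0), 0.5, (5.5, 0.0), 0.5, 0.5, 0.5))
```

Shrinking a radius below the spread of its class lowers the $R^2$ term but is paid for through the slack term, which is the trade-off the parameters $\nu_1$ and $\nu_2$ control.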

This optimisation can produce a description of two minimal hyperspheres
enclosing the two classes. The decision boundary can be defined as the
bisector between their surfaces. However, this model is for one-class problems
in which the task is to provide a tight data description or to detect
outliers. When applied to a two-class problem where the data samples of the
two classes are balanced, the boundary of one-class methods is inappropriate.
RSVC can overcome this problem by allowing the hyperspheres to acquire a
larger area by minimising $-k\|a_1 - a_2\|^2$ and creating a larger margin by
minimising $-\mu_1\rho_1 - \mu_2\rho_2$, while still trying to provide a data
description for the two classes.
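
One way to read "the bisector between their surfaces" is as the set of points whose signed distances to the two sphere surfaces are equal. The sketch below implements that reading in input space; the interpretation and the function name are assumptions made for illustration, not a rule stated in the text.

```python
import math


def bisector_label(x, a1, R1, a2, R2):
    """Label x by comparing its signed distances to the two sphere surfaces.

    The signed distance to a surface is (distance to centre) - radius.
    Returns +1 when x is at least as close to the surface of sphere 1.
    """
    to_surface_1 = math.dist(x, a1) - R1
    to_surface_2 = math.dist(x, a2) - R2
    return 1 if to_surface_2 - to_surface_1 >= 0 else -1


# Spheres of radius 1 and 3 centred at x = 0 and x = 10: the surfaces face
# each other at x = 1 and x = 7, so this boundary crosses the axis at x = 4,
# not at the midpoint x = 5 of the two centres.
print(bisector_label((3.9, 0.0), (0.0, 0.0), 1.0, (10.0, 0.0), 3.0))
print(bisector_label((4.1, 0.0), (0.0, 0.0), 1.0, (10.0, 0.0), 3.0))
```

The example shows why unequal radii matter: a boundary defined from the surfaces shifts towards the smaller sphere, unlike one defined from the centres alone.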

4  Experiments 

4.1  2-D  Demonstration  of  RSVC 

Figure 2 shows visual results for experiments performed on a simple 2-D
dataset using RSVC. When parameter $k = 0$, the RSVC optimisation function
becomes the optimisation function for two SVDDs, hence two SVDDs are a special
case of RSVC. It can be seen that when $k$ increases, the two hyperspheres
repel each other, resulting in a larger margin in between. Those data samples
outside the hyperspheres but inside this margin are penalised by a cost
proportional to $1/(\nu_1 n_1)$ or $1/(\nu_2 n_2)$. The decision boundary is
the bisector between the hyperspheres' surfaces. The first row in Figure 2
shows that when parameter $k$ increases, the hypersphere enclosing the
positive samples moves away from the negative samples while keeping all the
positive samples inside it. The second row in Figure 2 shows that when
$\mu_1\nu_1$ and $\mu_2\nu_2$ increase, more positive samples are outside the
hyperspheres.

Classification experiments were conducted on 9 UCI datasets¹. Details of
these datasets are listed in Table 1. The datasets were divided into 2
subsets: 50% of the data for training and the other 50% for testing. The
training process was done using 5-fold cross validation. The parameters for
the methods are as follows. Gaussian mixture models (GMM) [10] use 64 mixture
components. OC-SVM parameters are searched in
$\gamma \in \{2^{-13}, 2^{-11}, \ldots, 2^{1}\}$ and
$\nu \in \{2^{-5}, 2^{-4}, \ldots, 2^{-2}\}$. Parameters of SVDD and SVDD with
negative examples (Two SVDDs) are searched in
$\gamma \in \{2^{-13}, 2^{-11}, \ldots, 2^{1}\}$ and
$\nu \in \{2^{-5}, 2^{-4}, \ldots, 2^{-2}\}$. SVM parameters are searched in
$\gamma \in \{2^{-13}, 2^{-11}, \ldots, 2^{1}\}$ and
$C \in \{2^{-1}, 2^{3}, \ldots, 2^{15}\}$; and RSVC parameters are searched in
$\gamma \in \{2^{-7}, 2^{-5}, \ldots, 2^{-1}\}$,
$\nu_1 = \nu_2 \in \{0.001, 0.01\}$, $\mu_1 = \mu_2 \in \{10, 30, \ldots, 90\}$,
and $k \in \{0.5, 0.7, 0.9\}$.
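
The RSVC search grid can be enumerated as below. This is a sketch: the intermediate grid values are assumed to follow the step implied by the first two listed values (exponents of 2 in steps of 2 for the kernel width, steps of 20 for $\mu$), and the dictionary keys are hypothetical names.

```python
from itertools import product

# Kernel width: 2^-7, 2^-5, 2^-3, 2^-1 (step assumed from the listed values).
gammas = [2.0 ** e for e in range(-7, 0, 2)]
nus = [0.001, 0.01]            # nu_1 = nu_2
mus = list(range(10, 91, 20))  # mu_1 = mu_2: 10, 30, 50, 70, 90 (step assumed)
ks = [0.5, 0.7, 0.9]           # repulsion parameter k

rsvc_grid = [
    {"gamma": g, "nu": nu, "mu": mu, "k": k}
    for g, nu, mu, k in product(gammas, nus, mus, ks)
]
print(len(rsvc_grid))  # 4 * 2 * 5 * 3 = 120 candidate settings
```

Every pair in this grid has $\mu\nu$ between 0.01 and 0.9, so the product stays below 1 throughout the search.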

1 Available online at http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/

Fig. 2. The first row contains screenshots for RSVC when $k$ = 0, 0.3 and 0.6,
and $\mu_1\nu_1 = \mu_2\nu_2 = 0.2$. The second row contains screenshots for
RSVC when $\mu_1\nu_1 = \mu_2\nu_2$ = 0.1, 0.2 and 0.5, and $k = 0.9$. A
Gaussian RBF kernel was used, with $\gamma = 5$. Red points are positive
samples and blue points are negative samples.

Note that the parameter $\gamma$ in RSVC is searched in a narrower range than
that in SVM, while $\nu_1 n_1$ and $\nu_2 n_2$ are searched over roughly as
many options as the parameter $C$. This is to produce a sparse set of support
vectors and avoid overfitting of the two SVDDs. Parameter
$k \in \{0.5, 0.7, 0.9\}$ is chosen to favour classification over tight
description. After the best parameters are selected in the cross validation
step, the models are trained again with them on the whole training set and are
tested on the 50% unseen test set. Experiments were repeated 10 times and the
results were averaged with standard deviations given.
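
The evaluation protocol (50/50 split, 5-fold cross validation on the training half, 10 repetitions, mean and standard deviation) can be sketched with the standard library alone. The text does not say whether the split is random or redrawn for each repetition, so both are assumptions here, and `fit_and_score` is a hypothetical callable standing in for model selection, retraining and testing.

```python
import random
import statistics


def half_split(n, rng):
    """Shuffle the indices 0..n-1 and return (train, test) halves."""
    idx = list(range(n))
    rng.shuffle(idx)
    half = n // 2
    return idx[:half], idx[half:]


def k_folds(indices, k):
    """Partition indices into k folds of near-equal size."""
    return [indices[i::k] for i in range(k)]


def repeated_evaluation(n, fit_and_score, repeats=10, k=5, seed=0):
    """Repeat split -> cross validation -> test and return (mean, std).

    fit_and_score(train, folds, test) is a hypothetical callable that would
    select parameters on the folds, retrain on `train` and return the
    prediction rate on `test`.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        train, test = half_split(n, rng)
        folds = k_folds(train, k)
        scores.append(fit_and_score(train, folds, test))
    return statistics.mean(scores), statistics.stdev(scores)


# Dummy scorer that reports the test fraction, just to exercise the loop.
mean, std = repeated_evaluation(100, lambda tr, folds, te: len(te) / 100)
print(mean, std)
```

Keeping the test half out of the fold construction is what makes the second results table an estimate on unseen data rather than a repeat of the cross validation scores.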

Table  2  shows  the  prediction  rates  in  cross  validation  training.  Table  3  shows 
the  prediction  rates  on  unseen  test  sets  with  best  parameters  selected. 

It can be seen that GMM, OCSVM and SVDD perform poorly in the classification
task.

The two SVDDs have much higher performance than these one-class methods since
they describe two minimal hyperspheres enclosing the two classes and the
decision boundary is the bisector between their surfaces. SVM generally has
higher performance than two SVDDs (on every dataset except Fourclass in
Table 3), since it trains a maximal-margin separating hyperplane rather than
two minimal hyperspheres. RSVC shows the highest performance on all 9 datasets
in cross validation (Table 2) and on 6 of the 9 unseen test sets (Table 3).
RSVC can overcome the limitation of two SVDDs for the classification task by
training two SVDDs that repel each other, allowing the spheres to acquire a
larger area and creating a larger margin while still trying to provide a data
description for the two classes.


Table 1. Dataset information: number of classes, dataset size and number of features

Data set         #class  size  #feature
Fourclass        2       862   2
Liver disorders  2       345   6
Heart            2       270   13
Wine             3       178   13
Breast Cancer    2       683   10
Diabetes         2       768   8
Australian       2       690   14
Ionosphere       2       351   34
German numer     2       1000  24


Table 2. Prediction rates (%) in cross validation training of classification methods

Dataset          GMM          OCSVM        SVDD         Two SVDDs    SVM          RSVC
Fourclass        67.24 ±5.73  62.15 ±3.3   54.01 ±4.97  71.44 ±4.15  75.08 ±4.4   77.98 ±4.52
Liver disorders  40.86 ±5.63  50.41 ±5.69  55.41 ±5.87  55.15 ±5.49  59.90 ±3.53  60.14 ±5.56
Heart            46.33 ±4.26  60.24 ±5.89  46.41 ±4.24  61.11 ±5.98  72.56 ±4.24  76.44 ±4.45
Wine             33.43 ±5.03  55.57 ±4.07  46.43 ±5.8   59.89 ±3.7   75.24 ±5.56  83.15 ±4.59
Breast cancer    56.52 ±3.34  73.85 ±4.11  62.91 ±4.15  77.16 ±4.08  81.29 ±4.44  81.49 ±4.46
Diabetes         55.24 ±5.06  51.84 ±3.99  40.24 ±5.16  50.95 ±5.63  63.47 ±5.93  66.87 ±3.28
Australian       54.15 ±5.84  58.36 ±5.52  48.23 ±5.16  61.03 ±5.75  70.96 ±3.53  71.90 ±3.66
Ionosphere       57.12 ±3.61  65.48 ±3.85  34.26 ±3.48  68.92 ±4.59  73.69 ±4.51  75.86 ±4.29
German numer     40.09 ±5.49  58.96 ±5.39  58.14 ±5.31  59.65 ±5.51  64.04 ±5.99  65.75 ±3.35


Table 3. Prediction rates (%) on unseen test sets; classification methods on 9 datasets

Dataset          GMM          OCSVM        SVDD         Two SVDDs    SVM          RSVC
Fourclass        67.24 ±5.73  59.08 ±3.24  54.44 ±5.09  72.24 ±5.04  70.72 ±5.64  75.65 ±5.92
Liver disorders  40.86 ±5.63  43.48 ±4.88  47.68 ±4.4   50.25 ±5.15  52.03 ±3.03  54.12 ±5.53
Heart            46.33 ±4.26  57.49 ±4.83  46.41 ±4.24  61.08 ±3.02  71.51 ±4.47  72.12 ±4.36
Wine             33.43 ±5.03  42.09 ±6.99  21.41 ±2.35  46.46 ±5.84  75.66 ±4.86  76.99 ±4.69
Breast cancer    56.52 ±3.34  73.08 ±4.01  48.34 ±7.87  75.03 ±4.33  79.92 ±4.34  79.79 ±5.07
Diabetes         55.24 ±5.06  55.68 ±5.34  39.30 ±4.98  54.10 ±5.71  60.21 ±3.16  59.00 ±3.81
Australian       54.15 ±5.84  56.44 ±5.53  48.38 ±5.03  55.75 ±3.83  69.71 ±3.42  68.95 ±3.53
Ionosphere       57.12 ±3.61  62.55 ±4.71  38.41 ±2.7   65.79 ±5.72  69.07 ±4.3   70.74 ±4.63
German numer     40.09 ±5.49  58.07 ±5.47  58.40 ±5.34  57.46 ±5.32  62.30 ±5.7   63.90 ±5.67

5  Conclusion 

We  have  proposed  the  repulsive-SVDD  classification  to  extend  SVDD  for  binary 
classification  problems.  Two  hyperspheres  are  trained  in  an  optimisation  problem 
to  describe  the  distribution  of  two  classes.  Additional  requirements  are  added  to 
the  optimisation  problem  to  help  with  the  discrimination  task.  First,  the  distance 
between  two  hypersphere  centres  is  maximised  to  allow  hyperspheres  to  expand. 
Second,  margins  between  the  hypersphere  surfaces  and  data  are  maximised.  The 
resulting  method  can  create  a  decision  boundary  that  takes  information  not  only 
from  distributions  of  the  classes  but  also  the  boundary’s  margins.  Experimental 
results  on  9  datasets  validate  the  good  performance  of  the  proposed  method. 